
    HCU400: An Annotated Dataset for Exploring Aural Phenomenology Through Causal Uncertainty

    The way we perceive a sound depends on many aspects: its ecological frequency, acoustic features, typicality, and, most notably, its identified source. In this paper, we present the HCU400: a dataset of 402 sounds ranging from easily identifiable everyday sounds to intentionally obscured artificial ones. It aims to lower the barrier to the study of aural phenomenology as the largest available audio dataset to include an analysis of causal attribution. Each sample has been annotated with crowd-sourced descriptions, as well as familiarity, imageability, arousal, and valence ratings. We extend existing calculations of causal uncertainty, automating and generalizing them with word embeddings. Upon analysis, we find that individuals provide less polarized emotion ratings as a sound's source becomes increasingly ambiguous; individual ratings of familiarity and imageability, on the other hand, diverge as uncertainty increases, despite a clear negative trend on average.
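
    As a minimal sketch of how a causal-uncertainty calculation might be automated with word embeddings (the embedding model, clustering settings, and entropy formulation below are illustrative assumptions, not the paper's exact method): embed each crowd-sourced description, group the embeddings by semantic similarity, and score ambiguity as the entropy of the resulting groups.

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering

        def causal_uncertainty(embeddings, distance_threshold=0.5):
            """Entropy-based ambiguity score over description embeddings.

            embeddings: (n_descriptions, dim) array, one vector per
            crowd-sourced description (any embedding model will do).
            """
            # Group descriptions that plausibly name the same source.
            # ("metric" is called "affinity" in scikit-learn < 1.2.)
            clusters = AgglomerativeClustering(
                n_clusters=None, distance_threshold=distance_threshold,
                metric="cosine", linkage="average").fit_predict(embeddings)
            _, counts = np.unique(clusters, return_counts=True)
            p = counts / counts.sum()
            # One dominant cluster -> low entropy -> low causal uncertainty.
            return float(-np.sum(p * np.log2(p)))

    A sound whose descriptions all land in one cluster scores near zero; a sound that elicits many unrelated attributions scores high.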

    Towards Improved Room Impulse Response Estimation for Speech Recognition

    We propose to characterize and improve the performance of blind room impulse response (RIR) estimation systems in the context of a downstream application scenario: far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance as a means of evaluating neural RIR estimators. We then propose a GAN-based architecture that encodes RIR features from reverberant speech, constructs an RIR from the encoded features, and uses a novel energy decay relief loss to optimize for capturing energy-based properties of the input reverberant speech. We show that our model outperforms state-of-the-art baselines on acoustic benchmarks (by 72% on the energy decay relief and 22% on an early-reflection energy metric), as well as in an ASR evaluation task (by 6.9% in word error rate).
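
    As a hedged sketch of what an energy decay relief (EDR) loss could look like (the STFT settings and the log-magnitude L1 distance are assumptions, not the authors' exact formulation): the EDR generalizes the energy decay curve per frequency band, as the energy remaining at or after each time frame, and the loss compares the log-EDRs of the estimated and reference RIRs.

        import torch

        def energy_decay_relief(rir, n_fft=512, hop=128):
            # EDR(t, f): spectrogram energy remaining at or after frame t
            # in band f, i.e. a backward cumulative sum over time.
            spec = torch.stft(rir, n_fft=n_fft, hop_length=hop,
                              window=torch.hann_window(n_fft,
                                                       device=rir.device),
                              return_complex=True)
            energy = spec.abs() ** 2
            edr = torch.flip(torch.cumsum(torch.flip(energy, [-1]), -1), [-1])
            return 10.0 * torch.log10(edr + 1e-8)

        def edr_loss(rir_estimate, rir_reference):
            # L1 distance between log-EDRs of same-length RIRs.
            return torch.mean(torch.abs(energy_decay_relief(rir_estimate)
                                        - energy_decay_relief(rir_reference)))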

    System specific power reduction techniques for wearable navigation technology

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (page 73).

    As a result of advances in computer vision, mapping, and controls, wearable technology for visually impaired individuals has become a growing space of research within assistive technology. A team at the MIT Energy Efficient Circuits Group has made an important stride forward by presenting a wearable navigation prototype in a fully integrated hardware form factor, but one of the biggest barriers to the device's usability is its excessive power consumption. As such, the goal of this work is, broadly, to: (1) understand the largest sources of power consumption in the initial navigation prototype system, and expose relevant features for control; (2) develop a set of algorithms that can capitalize on the motion of a user, the motion of the environment around the user, and the proximity of obstacles in the environment to the user, in order to dynamically tune the exposed parameters and scale power as necessary; and (3) lay the foundation for the next-generation wearable navigation prototype by translating critical software operations and the power-scaling algorithms into a hardware architecture capable of working with a smaller and less power-intensive depth camera.

    The first portion of this work focuses on the wearable navigation prototype built around Texas Instruments' OPT9220/9221 time-of-flight chipset. Illumination voltage, frame rate, and integration duty cycle are identified as key control features, and a step-rate estimation algorithm, a scene-statistics algorithm, and a frame-skipping controller to tune these features are built and tested. The latter half of the work focuses on the newer OPT8320 evaluation platform, for which a Bluespec SystemVerilog implementation of these power algorithms and the point cloud generation operation is presented and tested. Overall, the work demonstrates the critical concept that simple, system-specific, fully integrated algorithms can effectively be used to reduce analog power system-wide.

    by Ishwarya Ananthabhotla. M. Eng.
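
    To illustrate the kind of control involved, here is a minimal frame-skipping sketch (the motion statistic, thresholds, and frame rates are hypothetical, not the thesis's actual algorithms): the depth camera runs fast only when the scene is changing or an obstacle is close, and drops frames otherwise to save illumination and readout power.

        import numpy as np

        def choose_frame_rate(prev_depth, curr_depth, near_mm=800,
                              motion_thresh=50.0, slow_fps=5, fast_fps=30):
            # Scene statistic: mean absolute per-pixel depth change (mm).
            motion = np.mean(np.abs(curr_depth.astype(np.float32)
                                    - prev_depth.astype(np.float32)))
            # Proximity: any valid pixel (depth > 0) closer than near_mm.
            valid = curr_depth[curr_depth > 0]
            obstacle_near = valid.size > 0 and valid.min() < near_mm
            # High motion or a near obstacle demands the full frame rate;
            # otherwise skip frames to cut system power.
            if obstacle_near or motion > motion_thresh:
                return fast_fps
            return slow_fps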

    Cognitive Audio: Enabling Auditory Interfaces with an Understanding of How We Hear

    Over the last several decades, neuroscientists, cognitive scientists, and psychologists have made strides in understanding the complex and mysterious processes that define the interaction between our minds and the sounds around us. Some of these processes, particularly at the lowest levels of abstraction relative to a sound wave, are well understood and easy to characterize across large sections of the human population; others, however, are the sum of intuition and observations drawn from small-scale laboratory experiments, and remain poorly understood.

    In this thesis, I suggest that there is value in coupling insight into the workings of auditory processing, beginning with abstractions in pre-conscious processing, with new frontiers in interface design and state-of-the-art infrastructure for parsing and identifying sound objects, as a means of unlocking audio technologies that are far more immersive, naturalistic, and synergistic than those in the existing landscape. From the vantage point of today's computational models and devices, which largely represent audio at the level of the digital sample, I gesture towards a world of auditory interfaces that work deeply in concert with uniquely human tendencies, allowing us to altogether re-imagine how we capture, preserve, and experience bodies of sound -- towards, for example, augmented reality devices that manipulate sound objects to minimize distractions, lossy "codecs" that operate on semantic rather than time-frequency information, and soundscape design engines that operate on large corpora of audio data and optimize for aesthetic or experiential outcomes instead of purely objective ones.

    To do this, I introduce and explore a new research direction, termed "Cognitive Audio", focused on the marriage of principles governing pre-conscious auditory cognition with traditional HCI approaches to auditory interface design via explicit statistical modeling. Along the way, I consider the major roadblocks to this convergence: I ask how we might "probe" and measure a cognitive principle of interest robustly enough to inform system design, in the absence of the immediately observable biophysical phenomena that accompany, for example, visual cognition; I also ask how we might build reliable, meaningful statistical models from the resulting data that drive compelling experiences despite inherent noise, sparsity, and generalizations made at the level of the crowd.

    I discuss early insights into these questions through the lens of a series of projects centered on auditory processing at different levels of abstraction. I begin with a discussion of early work focused on cognitive models of lower-level phenomena; these exercises then inform a comprehensive effort to construct general-purpose estimators of gestalt concepts in sound understanding. I then demonstrate the affordances of these estimators in the context of application systems that I construct and characterize, incorporating additional explorations of methods for personalization that sit atop these estimators. Finally, I conclude with a dialogue on the intersection between the key contributions of this dissertation and major themes relevant to the audio technology and computation world today.

    Ph.D.

    Manipulating Causal Uncertainty in Sound Objects


    Towards a Perceptual Loss: Using a Neural Network Codec Approximation as a Loss for Generative Audio Models

    Generative audio models based on neural networks have led to considerable improvements across fields including speech enhancement, source separation, and text-to-speech synthesis. These systems are typically trained in a supervised fashion using simple element-wise ℓ1 or ℓ2 losses. However, because such losses do not capture properties of the human auditory system, they encourage modelling perceptually meaningless aspects of the output, wasting capacity and limiting performance. Additionally, while adversarial models have been employed to encourage outputs that are statistically indistinguishable from ground truth, and have yielded improvements in this regard, such losses are not required to model perception explicitly; furthermore, training adversarial networks remains a slow and unstable process. In this work, we investigate an idea fundamentally rooted in psychoacoustics. We train a neural network to emulate an MP3 codec as a differentiable function. By feeding the output of a generative model through this MP3 function, we remove signal components that are perceptually irrelevant before computing a loss. To further stabilize gradient propagation, we define our loss on intermediate layer outputs, as has proven useful in image-domain methods. Our experiments on an autoencoding task show an improvement over standard losses in listening tests, indicating the potential of psychoacoustically motivated models for audio generation.
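
    A minimal sketch of the loss described (the codec network's architecture and the choice of layers are assumptions; the abstract specifies only that a network emulating an MP3 codec is used and that intermediate layer outputs define the loss):

        import torch
        import torch.nn as nn

        class CodecApproxLoss(nn.Module):
            """Deep-feature loss through a frozen codec-approximation net.

            codec_net: a pretrained nn.Sequential assumed to emulate an
            MP3 codec as a differentiable function (hypothetical here).
            """
            def __init__(self, codec_net, layer_ids=(1, 3, 5)):
                super().__init__()
                self.codec = codec_net.eval()
                for p in self.codec.parameters():
                    p.requires_grad_(False)      # codec stays fixed
                self.layer_ids = set(layer_ids)

            def _features(self, x):
                feats, h = [], x
                for i, layer in enumerate(self.codec):
                    h = layer(h)
                    if i in self.layer_ids:
                        feats.append(h)
                return feats

            def forward(self, generated, target):
                # Compare activations rather than waveforms, so signal
                # components the codec discards as perceptually
                # irrelevant never contribute to the loss.
                return sum(torch.mean(torch.abs(fg - ft))
                           for fg, ft in zip(self._features(generated),
                                             self._features(target)))

    In training, this would stand in for the element-wise ℓ1/ℓ2 term, e.g. loss = CodecApproxLoss(codec_net)(decoder_output, clean_audio).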

    Towards the prediction of perceived room acoustical similarity

    Understanding perceived room acoustical similarity is crucial to generating perceptually optimized audio rendering algorithms that maximize perceived quality while minimizing computational cost. In this paper, we present a perceptual study in which listeners compare dynamic binaural renderings generated from spatial room impulse responses (SRIRs) obtained in several rooms and positions, and are asked to identify whether the renderings belong to the same space. The perceptual results, together with monaural room acoustical parameters, are used to build a prediction model that estimates the perceived similarity of two SRIRs.
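
    A hedged sketch of the prediction step (the parameter set and model family below are assumptions; the abstract says only that monaural room acoustical parameters feed a similarity prediction model): absolute differences in parameters such as RT60, C50, and DRR for a pair of SRIRs are mapped to the probability that listeners judge the pair to come from the same space.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        # Hypothetical stand-in data: per-pair |ΔRT60| (s), |ΔC50| (dB),
        # |ΔDRR| (dB), plus binary "same space" listener judgments.
        delta_params = rng.uniform(0.0, 1.0, size=(200, 3))
        same_space = (delta_params.sum(axis=1) < 1.0).astype(int)

        model = LogisticRegression().fit(delta_params, same_space)
        # Predicted perceived-similarity for each SRIR pair:
        p_same = model.predict_proba(delta_params)[:, 1]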

    Folding Angle Regulation by Curved Crease Design for Self-Assembling Origami Propellers

    This paper describes a method for manufacturing complex three-dimensional curved structures by self-folding layered materials. Our main focus is first to show that the material can cope with curved-crease self-folding, and then to utilize the curvature to predict the folding angles. The self-folding process employs uniform heat to induce self-folding of the material, and we show the successful generation of several types of propellers as a proof of concept. We further show that the resulting device is functional by demonstrating its levitation in the presence of a remotely applied magnetic field.